Finding One's Best Crowd: Online Learning By Exploiting Source Similarity
Abstract
We consider an online learning problem (classification or prediction) involving disparate sources of sequentially arriving data, in which a user learns over time the best set of data sources to use in constructing the classifier by exploiting their similarity. We first show that, when (1) the similarity information among data sources is known and (2) data from different sources can be acquired without cost, a judicious selection of data from different sources effectively enlarges the training sample size compared to using a single data source, thereby improving the rate and performance of learning; we establish this by bounding the classification error of the resulting classifier. We then relax assumption (1) and characterize the loss in learning performance when the similarity information must also be acquired through repeated sampling. Finally, we relax both (1) and (2) and present a cost-efficient algorithm that identifies a best crowd from a potentially large set of data sources in terms of both classifier performance and data acquisition cost. This problem has various applications, including online prediction systems with time series data of various forms, such as financial markets, advertising, and network measurement.
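The trade-off described in the abstract (more pooled samples versus bias from dissimilar sources) can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes samples bounded in [0, 1], a known upper bound `gaps[k]` on how far each source's mean lies from the target source's mean, and a Hoeffding-style confidence radius. The names `hoeffding_radius`, `best_crowd`, `counts`, and `gaps` are hypothetical and chosen only for illustration.

```python
import math


def hoeffding_radius(n, delta=0.05):
    """Half-width of a two-sided Hoeffding interval for the mean of
    n independent samples in [0, 1], at confidence level 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def best_crowd(counts, gaps, delta=0.05):
    """Choose which sources to pool when estimating the target mean.

    counts[k]: number of samples available from source k.
    gaps[k]:   assumed known upper bound on |mean_k - mean_target|
               (0.0 for the target source itself).
    Returns (sorted list of chosen source indices, error bound).
    """
    # Consider pooling the j most similar sources, for every j.
    order = sorted(range(len(counts)), key=lambda k: gaps[k])
    best_set, best_bound = [], math.inf
    chosen, pooled_n, weighted_gap = [], 0, 0.0
    for k in order:
        chosen.append(k)
        pooled_n += counts[k]
        weighted_gap += counts[k] * gaps[k]
        # Bias of the pooled mean plus its Hoeffding radius.
        bound = weighted_gap / pooled_n + hoeffding_radius(pooled_n, delta)
        if bound < best_bound:
            best_set, best_bound = sorted(chosen), bound
    return best_set, best_bound


# Source 0 is the target; source 1 is very similar; source 2 is not.
crowd, bound = best_crowd(counts=[100, 100, 100], gaps=[0.0, 0.01, 0.5])
print(crowd, round(bound, 3))
```

In this toy setting, pooling the nearly identical source tightens the bound relative to using the target source alone, while adding the dissimilar source would loosen it because its bias outweighs the gain in sample size.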
Similar resources
Learning to Rank Scientific Documents from the Crowd
Motivation: Finding related published articles is an important task in any science, but with the explosion of new work in the biomedical domain it has become especially challenging. Most existing methodologies use text similarity metrics to identify whether two articles are related or not. However, biomedical knowledge discovery is hypothesis-driven. The most related articles may not be ones wit...
Genre Ontology Learning: Comparing Curated with Crowd-Sourced Ontologies
The Semantic Web has made it possible to automatically find meaningful connections between musical pieces, which can be used to infer their degree of similarity. Similarity, in turn, can be used by recommender systems driving music discovery or playlist generation. One useful facet of knowledge for this purpose is fine-grained genres and their inter-relationships. In this paper we present a meth...
A standard Interactive Multimedia eBook Generator Engine for e-Learning Process
Introduction: Using standard authoring tools is essential to promote e-learning in the teaching-learning process. Learning content in medical sciences often consists of multimedia elements. In addition, medical content frequently needs to be revised and updated. Hence, access to authoring tools that can encompass multimedia elements and allow easy content revision is helpful in e-...
Exploiting Online Discussions in Collaborative Distributed Requirements Engineering
Large, distributed software development projects, like Open Source Software (OSS), adopt different collaborative working tools, including online forums and mailing list discussions, that are a valuable source of knowledge for requirements engineering tasks in software evolution, such as model revision and evolution. In our research, we aim at providing tool support for retrieving information from ...
Utilizing Online Social Network and Location-Based Data to Recommend Products and Categories in Online Marketplaces
Recent research has unveiled the importance of online social networks for improving the quality of recommender systems and encouraged the research community to investigate better ways of exploiting the social information for recommendations. To contribute to this sparse field of research, in this paper we exploit users’ interactions along three data sources (marketplace, social network and loca...